
Self Organizing Map (SOM) with Practical Implementation


Amir Ali · Published in The Art of Data Science · May 26, 2019

In this chapter of Deep Learning, we will discuss Self Organizing Maps (SOM). SOM is an unsupervised deep learning technique, and we will cover both the theory and a practical implementation from scratch.

This chapter spans 5 parts:

1. What is a Self Organizing Map?
2. K-Means Clustering Technique
3. SOM Network Architecture
4. How Self Organizing Maps Work
5. Practical Implementation of SOMs

1. What is a Self Organizing Map?

The Self Organizing Map is one of the most popular neural models. It belongs to the category of competitive learning networks. The SOM is based on unsupervised learning, which means that no human intervention is needed during training and that little needs to be known about the characteristics of the input data. We could, for example, use the SOM to cluster the input data by membership. Because the SOM can be used to detect features inherent to the problem, it has also been called the Self Organizing Feature Map (SOFM).

The Self Organizing Map was developed by Professor Teuvo Kohonen and is used in many applications.

The purpose of a SOM is to provide a data visualization technique that helps us understand high-dimensional data by reducing its dimensions to a map. The SOM also represents the clustering concept by grouping similar data together.

Therefore it can be said that the Self Organizing Map reduces data dimensionality and displays the similarity among data points.

2. K-Means Clustering Technique

2.1: What is K-Means?

K-Means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.

2.2: How Does K-Means Clustering Work?

The K-Means clustering algorithm attempts to split a given anonymous dataset (a set containing no information about class identity) into a fixed number (k) of clusters.

Initially, k so-called centroids are chosen. A centroid is a data point (imaginary or real) at the center of a cluster.

Dataset Description:

This dataset has three attributes: the first is the item, which is our target for grouping similar items into clusters; the second and third attributes are the informative values of that item.

In the first step, we pick random rows as the initial centroids; let's suppose we take row 1 and row 3.

Now we compare these centroid values with each row of our data using the Euclidean distance formula.

Now let's solve them one by one.

Row 1 (A)

Row 2 (B)

Row 3 (C)

Row 4 (D)

Row 5 (E)

Now

Based on these distances, A and B belong to cluster 1, and C, D, and E belong to cluster 2, as shown in the table below.

Now we calculate the centroids of clusters 1 and 2, reassign each point to the cluster with the closest centroid, and repeat until the centroids stop changing.

Now find the centroids of cluster 1 and cluster 2:

New Centroid

X1 = (1, 0.5)

X2 = (1.7,3.7)

Previous Centroid

X1 = (1, 1)

X2 = (0, 2)

If the new centroid values are equal to the previous centroid values, then our clusters are final; otherwise, we repeat the steps until the new centroid values equal the previous ones.

In our case, the new centroid values are not equal to the previous centroids.

Now we reassign each point to the cluster with the closest centroid.

So

X1 = (1, 0.5)

X2 = (1.7,3.7)

We follow the same procedure as above.

Based on the closest distances, A, B, and C belong to cluster 1, and D and E belong to cluster 2.

The means of clusters 1 and 2 are:

X1 (Cluster 1) = (0.7, 1)

X2 (Cluster 2) = (2.5, 4.5)

New Centroid

X1 = ( 0.7 , 1)

X2 = ( 2.5 , 4.5)

Previous Centroid

X1 = (1, 0.5)

X2 = (1.7,3.7)

If the new centroid values are equal to the previous centroid values, then our clusters are final; otherwise, we repeat the steps until the new centroid values equal the previous ones.

In our case, the new centroid values are not equal to the previous centroids.

Now we again reassign each point to the cluster with the closest centroid, following the same steps.

Based on the closest distances, A, B, and C still belong to cluster 1, and D and E to cluster 2.

The means of clusters 1 and 2 are:

X1 (Cluster 1) = (0.7, 1)

X2 (Cluster 2) = (2.5, 4.5)

New Centroid

X1 = (0.7, 1)

X2 = (2.5, 4.5)

Previous Centroid

X1 = (0.7, 1)

X2 = (2.5, 4.5)

Here the new centroid values are equal to the previous values, so our clusters are final: A, B, and C belong to cluster 1, and D and E belong to cluster 2.

As shown in the figure:
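For readers who want to reproduce this result in code, here is a minimal sketch using scikit-learn's KMeans rather than the manual iteration above. The original table of item values was shown as an image; A = (1, 1) and C = (0, 2) are the initial centroids (rows 1 and 3), B is implied by the cluster means computed above, and D and E are assumed values chosen only so that their mean is (2.5, 4.5).

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data in the spirit of the worked example above.
points = np.array([[1, 1],    # A
                   [1, 0],    # B
                   [0, 2],    # C
                   [2, 4],    # D (assumed)
                   [3, 5]])   # E (assumed)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)            # A, B, C end up in one cluster; D, E in the other
print(kmeans.cluster_centers_)   # approximately (0.7, 1) and (2.5, 4.5)
```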

3. SOM Network Architecture

For our purposes, we'll be discussing a two-dimensional SOM. The network is created from a 2D lattice of 'nodes', each of which is fully connected to the input layer. The figure below shows a very small Kohonen network of 4 x 4 nodes connected to the input layer (shown in green), which represents a two-dimensional vector.

A simple Kohonen network

Each node has a specific topological position (an x, y coordinate in the lattice) and contains a vector of weights of the same dimension as the input vectors. That is to say, if the training data consists of vectors, V, of n dimensions:

V1, V2, V3…Vn

Then each node will contain a corresponding weight vector W, of n dimensions:

W1, W2, W3…Wn

The lines connecting the nodes in the above Figure are only there to represent adjacency and do not signify a connection as normally indicated when discussing a neural network. There are no lateral connections between nodes within the lattice.
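As a small illustration of this structure (not code from the article), a lattice like the 4 x 4 network above can be represented as a 3D array in which each node stores a weight vector of the same dimension as the input:

```python
import numpy as np

# A 4 x 4 Kohonen lattice for 2-dimensional inputs, as in the figure above.
# Each node holds a weight vector of the same dimension as the input (n = 2 here).
rng = np.random.default_rng(0)
lattice = rng.random((4, 4, 2))   # shape: (rows, columns, input dimension)

print(lattice[0, 0])   # the weight vector (W1, W2) of the node at lattice position (0, 0)
```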

4. How Self Organizing Maps Work

4.1: Learning Algorithm Overview

A SOM does not need a target output to be specified unlike many other types of network. Instead, where the node weights match the input vector, that area of the lattice is selectively optimized to more closely resemble the data for the class the input vector is a member of. From an initial distribution of random weights, and over many iterations, the SOM eventually settles into a map of stable zones. Each zone is effectively a feature classifier, so you can think of the graphical output as a type of feature map of the input space.

Training occurs in several steps and over many iterations:

1. Each node’s weights are initialized.

2. A vector is chosen at random from the set of training data and presented to the lattice.

3. Every node is examined to determine which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).

4. The radius of the neighborhood of the BMU is now calculated. This is a value that starts large, typically set to the ‘radius’ of the lattice, but diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU’s neighborhood.

5. The weights of each neighboring node (the nodes found in step 4) are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights get altered.

6. Repeat step 2 for N iterations.
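To make these six steps concrete, here is a compact, illustrative training loop in Python. This is only a sketch, not the article's code: the grid size, the exponential decay schedules, and the Gaussian influence function are common choices rather than requirements of the algorithm as stated above.

```python
import numpy as np

def train_som(data, grid_x=10, grid_y=10, n_iterations=1000,
              init_learning_rate=0.5, seed=0):
    """Illustrative SOM training loop following the six steps above."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]

    # Step 1: initialize each node's weight vector with random values
    weights = rng.random((grid_x, grid_y, n_features))

    init_radius = max(grid_x, grid_y) / 2                # the 'radius' of the lattice
    time_constant = n_iterations / np.log(init_radius)   # controls the exponential decay

    for t in range(n_iterations):
        # Step 2: pick a training vector at random
        v = data[rng.integers(len(data))]

        # Step 3: find the Best Matching Unit (smallest Euclidean distance)
        dists = np.linalg.norm(weights - v, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Step 4: shrink the neighborhood radius (and the learning rate) over time
        radius = init_radius * np.exp(-t / time_constant)
        lr = init_learning_rate * np.exp(-t / n_iterations)

        # Step 5: pull every node inside the radius towards the input vector,
        # with a Gaussian influence that falls off with distance from the BMU
        for i in range(grid_x):
            for j in range(grid_y):
                dist_sq = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
                if dist_sq <= radius ** 2:
                    influence = np.exp(-dist_sq / (2 * radius ** 2))
                    weights[i, j] += lr * influence * (v - weights[i, j])
        # Step 6: the loop repeats from Step 2 for the next iteration
    return weights

# Example usage on random data with 3 features:
# som_weights = train_som(np.random.rand(100, 3), grid_x=3, grid_y=3)
```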

4.2: Learning Algorithm in Detail

Now it’s time for us to learn how SOMs learn. Are you ready? Let’s begin. Right here we have a very basic self-organizing map.

Our input vectors amount to three features, and we have nine output nodes.

That being said, it might confuse you to see how this example shows three input nodes producing nine output nodes. Don’t get puzzled by that. The three input nodes represent three columns (dimensions) in the dataset, but each of these columns can contain thousands of rows. The output nodes in a SOM are always two-dimensional.

Now what we’ll do is turn this SOM into an input set that would be more familiar to you from when we discussed the supervised machine learning methods (artificial, convolutional, and recurrent neural networks) in earlier chapters.

Consider the structure of a Self Organizing Map with 3 visible input nodes and 9 output nodes connected directly to the inputs, as shown in the figure below.

Our input node values are:

Now let’s take a look at each step in detail.

Step 1: Initializing the Weights

Now, let’s take the topmost output node and focus on its connections with the input nodes. As you can see, there is a weight assigned to each of these connections.

Again, the word “weight” here carries a whole other meaning than it did with artificial and convolutional neural networks. For instance, with artificial neural networks we multiplied the input node’s value by the weight and, finally, applied an activation function. With SOMs, on the other hand, there is no activation function.

Weights are not separate from the nodes here. In a SOM, the weights belong to the output node itself. Instead of being the result of adding up the weights, the output node in a SOM contains the weights as its coordinates. Carrying these weights, it sneakily tries to find its way into the input space.

In this example, we have a 3D dataset, and each of the input nodes represents one coordinate (dimension) of the input. The SOM would compress these into a single output node that carries three weights. If we happened to be dealing with a 20-dimensional dataset, the output node would instead carry 20 weight coordinates.

These output nodes do not exactly become part of the input space, but they try to integrate into it nevertheless, developing imaginary places for themselves.

We have randomly initialized the values of the weights (close to 0 but not 0).

Step 2: Calculating the Best Matching Unit

The next step is to go through our dataset. For each of the rows in our dataset, we’ll try to find the node closest to it.

Say we take row number 1, and we extract its value for each of the three columns we have. We’ll then want to find which of our output nodes is closest to that row.

To determine the best matching unit, one method is to iterate through all the nodes and calculate the Euclidean distance between each node’s weight vector and the current input vector. The node with a weight vector closest to the input vector is tagged as the BMU.

The Euclidean distance is given as:

Dist = sqrt( (X1 - W1)² + (X2 - W2)² + ... + (Xn - Wn)² )

Where X is the current input vector and W is the node's weight vector.
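As a brief illustration (assuming the node weights are stored as the rows of a NumPy array), the BMU search described above could be written as:

```python
import numpy as np

def find_bmu(x, node_weights):
    """Return the index of the node whose weight vector is closest to input x.

    node_weights: array of shape (n_nodes, n_features), one row per output node.
    """
    distances = np.sqrt(((node_weights - x) ** 2).sum(axis=1))  # Euclidean distance to every node
    return int(np.argmin(distances)), distances
```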

Let's calculate the Best Matching Unit using the distance formula.

For the 1st node:

For the 2nd node:

For the 3rd node:

In the same way, we calculate all the remaining nodes, as you can see below.

Now that we have calculated the values for all the nodes, it's time to determine the Best Matching Unit.

As we can see, node number 3 is the closest with a distance of 0.4. We will call this node our BMU (best-matching unit).

What happens next?

To understand this next part, we’ll need to use a larger SOM.

By now you presumably understand the difference between weights in the SOM context and the weights we were used to when dealing with supervised machine learning.

The red circle in the figure above represents this map’s BMU. Now, the new SOM will have to update its weights so that it is even closer to our dataset’s first row. The reason we need this is that our input nodes cannot be updated, whereas we have control over our output nodes.

In simple terms, our SOM is drawing closer to the data point by stretching the BMU towards it. The end goal is to have our map as aligned with the dataset as we see in the image on the far right.

Step 3: Calculating the size of the neighborhood around the BMU

This is where things start to get more interesting! Each iteration, after the BMU has been determined, the next step is to calculate which of the other nodes are within the BMU’s neighborhood. All these nodes will have their weight vectors altered in the next step. So how do we do that? Well, it’s not too difficult… first, you calculate what the radius of the neighborhood should be and then it’s a simple application of good ol’ Pythagoras to determine if each node is within the radial distance or not.

The figure shows an example of the size of a typical neighborhood close to the commencement of training.

You can see that the neighborhood shown above is centered around the BMU (the red point) and encompasses most of the other nodes; the circle shows the radius.

The size of the neighborhood around the BMU decreases with an exponential decay function, shrinking on each iteration until it reaches just the BMU:

σ(t) = σ0 exp(-t / λ)

Where t = 0, 1, 2, 3, … is the current iteration, σ0 is the initial radius of the lattice, and λ is a time constant.

The figure below shows how the neighborhood decreases over time with each iteration.

Over time the neighborhood will shrink to the size of just one node… the BMU.

Now we know the radius, it’s a simple matter to iterate through all the nodes in the lattice to determine if they lay within the radius or not. If a node is found to be within the neighborhood then its weight vector is adjusted as follows in Step 4.

How to set the radius value in the self-organizing map?

It depends on the range and scale of your input data. If you are mean-zero standardizing your feature values, then try σ=4. If you are normalizing feature values to a range of [0, 1] then you can still try σ=4, but a value of σ=1 might be better. Remember, you have to decrease the learning rate α and the size of the neighborhood function with increasing iterations, as none of the metrics stay constant throughout the iterations in SOM.

It also depends on how large your SOM is. If it’s a 10 by 10, then use for example σ=5. Otherwise, if it’s a 100 by 100 map, use σ=50.

In unsupervised classification, σ is sometimes based on the Euclidean distance between the centroids of the first and second closest clusters.

Step 4: Adjusting the Weights

Every node within the BMU’s neighborhood (including the BMU) has its weight vector adjusted according to the following equation:

New Weights = Old Weights + Learning Rate (Input Vector - Old Weights)

W(t+1) = W(t) + L(t) (V(t) - W(t))

Where t represents the time-step and L is a small variable called the learning rate, which decreases with time. What this equation says is that the newly adjusted weight for the node is equal to the old weight (W) plus a fraction L of the difference between the input vector (V) and the old weight.

So, in our example, node 3 is the Best Matching Unit (as we saw in Step 2), with the following weights:

Learning rate = 0.5

So we update those weights according to the above equation.

For W3,1

New Weights = Old Weights + Learning Rate (Input Vector1 - Old Weights)

New Weights = 0.39 + 0.5 (0.7 - 0.39)

New Weights = 0.545

For W3,2

New Weights = Old Weights + Learning Rate (Input Vector2 - Old Weights)

New Weights = 0.42 + 0.5 (0.6 - 0.42)

New Weights = 0.51

For W3,3

New Weights = Old Weights + Learning Rate (Input Vector3 - Old Weights)

New Weights = 0.45 + 0.5 (0.9 - 0.45)

New Weights = 0.675

Updated weights:

So in this way we update the weights.
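As a quick check of the hand calculation above, here is the same update written in NumPy, using the input vector and learning rate from the example:

```python
import numpy as np

old_weights = np.array([0.39, 0.42, 0.45])   # the BMU's weights from the example
input_vector = np.array([0.7, 0.6, 0.9])     # the current row of the dataset
learning_rate = 0.5

new_weights = old_weights + learning_rate * (input_vector - old_weights)
print(new_weights)   # [0.545 0.51  0.675], matching the hand calculation above
```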

The decay of the learning rate is calculated each iteration using the following equation:

L(t) = L0 exp(-t / λ)

Where L0 is the initial learning rate.

As training goes on, the neighborhood gradually shrinks. By the end of training, the neighborhood has shrunk to zero size.

The influence rate shows the amount of influence a node's distance from the BMU has on its learning. In its simplest form, the influence rate is equal to 1 for all nodes close to the BMU and 0 for the others, but a Gaussian function is common too. Finally, from a random distribution of weights and through many iterations, the SOM arrives at a map of stable zones. In the end, interpretation of the data has to be done by a human, but the SOM is a great technique for revealing the invisible patterns in the data.
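The decaying radius, decaying learning rate, and Gaussian influence described above can be sketched as simple functions. The exponential forms and symbol names below are common conventions rather than anything fixed by this article:

```python
import numpy as np

def radius(t, sigma0, lam):
    """Neighborhood radius sigma(t), decaying exponentially with iteration t."""
    return sigma0 * np.exp(-t / lam)

def learning_rate(t, l0, lam):
    """Learning rate L(t), decaying exponentially with iteration t."""
    return l0 * np.exp(-t / lam)

def influence(dist_to_bmu, sigma_t):
    """Gaussian influence of the BMU on a node at a given lattice distance."""
    return np.exp(-(dist_to_bmu ** 2) / (2 * sigma_t ** 2))
```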

Note: If you want this article as a document, check out my Academia.edu profile.

5. Practical Implementation of SOMs: Fraud Detection

According to a recent report published by Markets & Markets, the Fraud Detection and Prevention Market is going to be worth USD 33.19 Billion by 2021. This is a huge industry and the demand for advanced Deep Learning skills is only going to grow. That’s why we have included this case study in this chapter.

The business challenge here is detecting fraud in credit card applications. We will be creating a Deep Learning model for a bank, given a dataset that contains information on customers applying for an advanced credit card.

This is the data that customers provided when filling the application form. Our task is to detect potential fraud within these applications. That means that by the end of the challenge, we will come up with an explicit list of customers who potentially cheated on their applications.

Dataset

Data Set Information: This file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. This dataset is interesting because there is a good mix of attributes — continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values.

Attribute Information: There are 6 numerical and 8 categorical attributes. The labels have been changed for the convenience of the statistical algorithms. For example, attribute 4 originally had 3 labels p, g, gg and these have been changed to labels 1, 2, 3.

A1: 0, 1 CATEGORICAL (formerly: a, b)
A2: continuous
A3: continuous
A4: 1, 2, 3 CATEGORICAL (formerly: p, g, gg)
A5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 CATEGORICAL (formerly: ff, d, i, k, j, aa, m, c, w, e, q, r, cc, x)
A6: 1, 2, 3, 4, 5, 6, 7, 8, 9 CATEGORICAL (formerly: ff, dd, j, bb, v, n, o, h, z)
A7: continuous
A8: 1, 0 CATEGORICAL (formerly: t, f)
A9: 1, 0 CATEGORICAL (formerly: t, f)
A10: continuous
A11: 1, 0 CATEGORICAL (formerly: t, f)
A12: 1, 2, 3 CATEGORICAL (formerly: s, g, p)
A13: continuous
A14: continuous
A15: 1, 2 class attribute (formerly: +, -)

Dataset sample:

Let’s solve the problem

Part 1: Data Preprocessing

1.1 Import the Libraries

In this step, we import three libraries for the data preprocessing part. A library is a collection of tools you can use to perform a specific job. First we import the numpy library, used for multidimensional arrays; then the pandas library, used to import the dataset; and finally the matplotlib library, used for plotting graphs.
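A minimal version of this step looks like the following (standard imports only):

```python
import numpy as np                 # multidimensional arrays
import pandas as pd                # importing the dataset
import matplotlib.pyplot as plt    # plotting graphs
```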

1.2 Import the dataset

In this step, we import the dataset; to do that we use the pandas library. After importing our dataset, we define our dependent and independent variables. Our independent variables are all the attributes except the last one, as you can see in the sample dataset, and we call them 'X'; the dependent variable is the last attribute, which we call 'y' here.

Note: we will build the SOM model, which is unsupervised deep learning, so we work only with the independent variables.
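A sketch of this step is shown below; the file name Credit_Card_Applications.csv is an assumption, so use whatever your copy of the dataset is called:

```python
# File name is an assumption; substitute the name of your own copy of the dataset.
dataset = pd.read_csv('Credit_Card_Applications.csv')

X = dataset.iloc[:, :-1].values   # independent variables: all columns except the last
y = dataset.iloc[:, -1].values    # dependent variable: the class attribute (approved or not)
```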

1.3 Feature Scaling

Feature scaling is a crucial part of data preprocessing. If we look at our dataset, the attributes contain numeric values on very different scales: some values are very high and some are very low. This would cause issues in our model, so to solve the problem we put all values on the same scale. There are two common methods for this: the first is normalization and the second is standardization (StandardScaler).

Here we use normalization, imported from the sklearn library.
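One way to do this with scikit-learn is MinMaxScaler, which rescales every feature to the [0, 1] range:

```python
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))   # normalize every feature to the [0, 1] range
X = sc.fit_transform(X)
```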

Part 2: Building & Training our Model

In this part, we build our Self Organizing Map model.

2.1 Import the Model

In this step, we import the SOM model (MiniSom), which was made by other developers. Link: https://test.pypi.org/project/MiniSom/1.0/
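After installing the package, the import is a single line:

```python
# Install the package first if needed: pip install MiniSom
from minisom import MiniSom
```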

2.2 Initialize our SOM model

In this step, we initialize our SOM model, passing several parameters. The first two are the dimensions of our SOM map: here x = 10 and y = 10, meaning we take a 10 by 10 grid. The third parameter is the input length: we have 15 different attributes in our dataset columns, so input_len = 15. The fourth parameter, sigma, is the radius of the neighborhood in the grid; we keep 1.0, which is the default value for SOMs. The last parameter is the learning rate, a hyperparameter that controls how much the weights are updated during each iteration; the higher the learning rate, the faster the convergence. We keep the default value, which is 0.5.
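With MiniSom, this initialization looks like the following:

```python
# 10 x 10 grid, 15 input features, default sigma and learning rate as described above
som = MiniSom(x=10, y=10, input_len=15, sigma=1.0, learning_rate=0.5)
```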

2.3 Initialize the weights

In this step, we randomly initialize the weights using our SOM model, passing only one parameter: our data (X).
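With MiniSom this is a single call:

```python
som.random_weights_init(X)   # initialize the weights with random samples drawn from X
```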

2.4 Train the Model

In this step we train our model, passing two arguments: the first is our data and the second is the number of iterations; here we choose 100.
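Again a single call with MiniSom:

```python
som.train_random(data=X, num_iteration=100)   # train on randomly picked rows for 100 iterations
```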

Part 3: Visualizing the result

In this step, we build a map of the Self Organizing Map. First we import the pylab library, which is used for the visualization of our result, along with several of its functions. bone() creates a blank window; in the next line of code, we plot the mean inter-neuron distances of all the winning nodes; then we add a color bar whose values lie between 0 and 1. For the markers, we use a red circle for customers who didn't get approval and a green square for customers who did. Then we make a for loop (i corresponds to the row index and x to the customer's vector of features); inside the loop we take the winning node of each customer and place a colored marker at its coordinates w[0] (x-coordinate) and w[1] (y-coordinate), choosing the marker ('o' for red, 's' for green) according to the dependent variable, which is 0 or 1 depending on whether the customer got approval.
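A sketch of this visualization, along the lines described above, is shown below. It assumes y is coded 0/1; if your copy of the dataset codes the class attribute as 1/2, subtract 1 first.

```python
from pylab import bone, pcolor, colorbar, plot, show

bone()                           # blank window
pcolor(som.distance_map().T)     # mean inter-neuron distances of the winning nodes
colorbar()                       # legend: values between 0 (dark) and 1 (white)

markers = ['o', 's']             # 'o' = circle (not approved), 's' = square (approved)
colors = ['r', 'g']              # red / green
for i, x in enumerate(X):        # i = customer index, x = that customer's feature vector
    w = som.winner(x)            # winning node of this customer
    # y is assumed to be coded 0/1; if your copy codes the class as 1/2, subtract 1 first
    plot(w[0] + 0.5, w[1] + 0.5,
         markers[y[i]], markeredgecolor=colors[y[i]],
         markerfacecolor='None', markersize=10, markeredgewidth=2)
show()
```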

Here is our Self Organizing Map: a red circle means the customer didn't get approval and a green square means the customer did. If we look for outliers, the white areas indicate high potential fraud, which is what we want to detect here. In the next part, we catch these cheaters, which as you can see include both red and green customers.

Part 4: Catch the Potential Fraud

In this part, we identify the potentially fraudulent customers from the Self Organizing Map we visualized above.

4.1 Mapping the winning nodes

In this step, we map all the winning nodes of the customers from the Self Organizing Map.
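With MiniSom, the mapping from grid coordinates to the customers that land on each node is:

```python
mappings = som.win_map(X)   # dict: (x, y) grid coordinates -> list of customers mapped to that node
```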

4.2 Catch the cheaters

In this step we catch the potential fraud. We take only those customers who potentially cheated: if we look at our SOM, we can clearly see that the mappings [(7, 8), (3, 1) and (5, 1)] are the potential cheats, and we use concatenate to put these three mapping values into one list.
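The coordinates below are the ones read off the map in this example; on your own run the white (suspicious) nodes will very likely sit at different positions, so substitute your own coordinates:

```python
# Winning nodes identified as suspicious (white) on the map above
frauds = np.concatenate((mappings[(7, 8)], mappings[(3, 1)], mappings[(5, 1)]), axis=0)
```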

4.3 Rescale the values

In this step, we convert the scaled values back to the original scale; to do that we use the inverse transform function.
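Because we scaled X with the MinMaxScaler earlier, the same scaler object can undo the scaling:

```python
frauds = sc.inverse_transform(frauds)   # back to the original, unscaled attribute values
```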

4.4 Final results

Then simply inspect frauds and you get the whole list of customers who potentially cheated the bank.

If you want the dataset and code, you can also check my GitHub profile.

End Notes

If you liked this article, be sure to click ❤ below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

To stay more aware of the world of machine learning, follow me. It's the best way to find out when I write more articles like this.

You can also follow me on GitHub for the code and dataset, follow me on Academia.edu for this article, reach out on Twitter, email me directly, or find me on LinkedIn. I'd love to hear from you.

That’s all folks, Have a nice day :)


